28 research outputs found

    SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

    This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.

    Cell-based screen for altered nuclear phenotypes reveals senescence progression in polyploid cells after Aurora kinase B inhibition.

    Cellular senescence is a widespread stress response and is widely considered to be an alternative cancer therapeutic goal. Unlike apoptosis, senescence is composed of a diverse set of subphenotypes, depending on which of its associated effector programs are engaged. Here we establish a simple and sensitive cell-based prosenescence screen with detailed validation assays. We characterize the screen using a focused tool compound kinase inhibitor library. We identify a series of compounds that induce different types of senescence, including a unique phenotype associated with irregularly shaped nuclei and the progressive accumulation of G1 tetraploidy in human diploid fibroblasts. Downstream analyses show that all of the compounds that induce tetraploid senescence inhibit Aurora kinase B (AURKB). AURKB is the catalytic component of the chromosome passenger complex, which is involved in correct chromosome alignment and segregation, the spindle assembly checkpoint, and cytokinesis. Although aberrant mitosis and senescence have been linked, a specific characterization of AURKB in the context of senescence is still required. This proof-of-principle study suggests that our protocol is capable of amplifying tetraploid senescence, which can be observed in only a small population of oncogenic RAS-induced senescence, and provides additional justification for AURKB as a cancer therapeutic target. This work was supported by the University of Cambridge, Cancer Research UK, Hutchison Whampoa; Cancer Research UK grants A6691 and A9892 (M.N., N.K., C.J.T., D.C.B., C.J.C., L.S.G., and M.S.); a fellowship from the Uehara Memorial Foundation (M.S.). This is the author accepted manuscript. The final version is available from the American Society for Cell Biology via http://dx.doi.org/10.1091/mbc.E15-01-000

    UniMorph 4.0: Universal Morphology

    The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.